Coarse Grained FPGA Overlay for Rapid Just-In-Time Accelerator Compilation

Abstract

Coarse-grained FPGA overlays built around the runtime programmable DSP blocks in modern FPGAs can achieve high throughput and improved scalability compared to traditional overlays built without detailed consideration of the FPGA architecture. These overlays can be mapped to using higher level compilers, achieving fast compilation, software-like programmability and run-time management, and high-level design abstraction. OpenCL allows programs running on a host computer to launch accelerator kernels which can be compiled at run-time for a specific architecture, thus enabling portability. However, the prohibitive hardware compilation times of traditional FPGA flows mean that FPGA tools cannot effectively use just-in-time (JIT) compilation or runtime performance scaling on FPGAs. We present a methodology for runtime compilation of dataflow graphs expressed as OpenCL kernels onto coarse-grained overlays. The methodology benefits from the high-level abstraction afforded by the OpenCL programming model, while mapping to the overlay significantly reduces compilation and load times. Key characteristics of this work include highly performant DSP-optimized functional units that scale to large overlays on modern devices and the ability to perform automatic resource-aware kernel replication up to the size of the overlay. We demonstrate place and route times orders of magnitude better than HLS flows, even when running on an embedded processor in the Xilinx Zynq.

Similar resources

Resource-Aware Just-in-Time OpenCL Compiler for Coarse-Grained FPGA Overlays

FPGA vendors have recently started focusing on OpenCL for FPGAs because of its ability to leverage the parallelism inherent to heterogeneous computing platforms. OpenCL allows programs running on a host computer to launch accelerator kernels which can be compiled at run-time for a specific architecture, thus enabling portability. However, the prohibitive compilation times (specifically the FPGA...

QUKU: A Coarse Grained Paradigm for FPGA

To fill the gap between increasing demand for reconfigurability and performance efficiency, coarse-grained reconfigurable architectures are seen as an emerging platform. Their advantage lies in quick dynamic reconfiguration and power efficiency. Despite these advantages, they have failed to make their mark. This paper describes the QUKU architecture, which uses a coarse-grained dynamically ...

Just-in-time Compilation for Generalized Parsing

Parsing syntactically extensible languages requires generalized parsers which are slow to generate for repeatedly changing grammars. This situation is similar to the execution of dynamic languages like JavaScript, suggesting that we can appropriate technology from that field to use in just-in-time compiled parsers. We implement two just-in-time compiling grammar interpreters, a simple one and a ...

A Coarse-Grain FPGA Overlay for Executing Data Flow Graphs

We explore the feasibility of using a coarse-grain overlay to transparently and dynamically accelerate the execution of hot segments of code that run on soft processors. The overlay, referred to as the Virtual Dynamically Reconfigurable (VDR), is tuned to realize data flow graphs in which nodes are machine instructions and the edges are inter-instruction dependences. A VDR consists of an array ...

Output Serialization for FPGA-based and Coarse-grained Processor Arrays

This paper deals with the mapping of loop programs onto processor arrays either implemented in an FPGA or available as (reconfigurable) coarse-grained processor architectures. Usually the proportion of processing elements to I/O interfaces is much higher, which gives rise to problems of data transportation and synchronization. In this realm, we propose a systematic approach in order to feed-out ...

Journal

Journal title: IEEE Transactions on Parallel and Distributed Systems

Year: 2022

ISSN: 1045-9219, 1558-2183, 2161-9883

DOI: https://doi.org/10.1109/tpds.2021.3116859